Strategies for improving the efficiency of quantum Monte Carlo calculations 
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We describe a number of strategies for minimizing and calculating accurately the statistical uncer- 
tainty in quantum Monte Carlo calculations. We investigate the impact of the sampling algorithm 
on the efficiency of the variational Monte Carlo method. Our finding that a relative time-step ratio 
of 1 : 4 is optimal in DMC is in agreement with the result of [J. Vrbik and S. M. Rothstein, Intern. 
J. Quantum Chem. 29, 461-468 (1986)]. Finally, we discuss the removal of serial correlation from 
data sets by reblocking, setting out criteria for the choice of block length and quantifying the effects 
of the uncertainty in the estimated correlation length. 
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I. INTRODUCTION 

Quantum Monte Carlo (QMC) methods are a class of 
stochastic ab initio techniques for solving the many-body 
Schrodinger equation [l|, [2| • They arc capable of achiev- 
ing accuracy comparable to that of post-Hartree-Fock 
quantum-chemistry techniques but with a much lower 
computational cost. The diffusion Monte Carlo (DMC) 
method in particular has no close competitors for calcu- 
lations of the energy of bulk periodic systems. 

The utility of QMC stems from the fact that the cost 
of achiev ing a given error bar scales as ^ for typical 
systems [26| , where N is the number of quantum parti- 
cles. The method is most useful when studying systems 
for which quantum-chemistry calculations are infeasible 
and density functional theory does not give a sufficiently 
accurate description of electronic correlation. The algo- 
rithms are intrinsically parallel, allowing QMC to take 
full advantage of developments in computer technology. 
The variational Monte Carlo (VMC) algorithm, for ex- 
ample, is almost perfectly parallelizable. Furthermore, 
existing QMC implementations are easily extended to 
different systems. One may apply the same basic algo- 
rithms, changing only the form of the trial wave function 
and the Hamiltonian, to systems comprising any combi- 
nation of particles and interparticle interactions. Because 
the trial wave function can be an explicit function of in- 
terparticle distances, the Kato cusp conditions and other 
correlation effects can be described compactly, without 
the need for large expansions of many determinants and 
other unwieldy functional forms 0. For a comprehen- 
sive overview of VMC and DMC, the reader is directed 
to Refs. H, i i i. 

The practical challenges facing QMC are largely con- 
cerned with improving the efficiency of the algorithms 
and the design of new trial wave functions. The compu- 
tational expense of a large calculation necessitates care- 
ful selection of the operational parameters. Typically one 



has a certain amount of computer time available within 
which one wishes to achieve the smallest possible statisti- 
cal error in the final result. In addition, the extraction of 
an accurate statistical error bar from serially correlated 
data is itself nontrivial. In this paper, we outline how 
to choose the optimal parameters and algorithms at the 
different stages of a QMC calculation and describe how 
to process the resulting data. 

This paper is structured as follows. Section [ll] gives 
an analysis of the many related factors contributing to 
the efficiency of VMC calculations. Section Hill describes 
how to improve the efficiency of DMC time step extrap- 
olation. In Sec. II VI we discuss the calculation of accurate 
error bars using the reblocking method and describe a 
robust scheme for choosing block lengths. We demon- 
strate in Sec. |V]that uncertainty in the estimated corre- 
lation length results in an error in the statistical error bar 
that can significantly enhance the probability of observ- 
ing outliers. Finally, we draw our conclusions in Sec. I VII 
We use Hartree atomic units {h = \e\ = iric = 47reo = 1) 
throughout this article. 



II. EFFICIENCY OF VMC CALCULATIONS 
A. Method 

In this section, we discuss practical schemes for achiev- 
ing maximal efficiency within the VMC method. We fo- 
cus on three aspects of a VMC calculation. The first 
is the sampling algorithm, which is how moves are pro- 
posed. The second is the use of decorrclation loops, which 
consist of additional moves for which we avoid evaluat- 
ing the local energy. We will demonstrate that decorre- 
lation loops can offer a twofold increase in efficiency. To 
our knowledge, there are no quantitative investigations 
of decorrelation loops in the literature. The third fac- 
tor we consider is the choice of time step, which governs 
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the width of the transition probability density function 
(PDF). Our findings are summarized by the set of rec- 
ommendations in Sec. Ill El 

Variational Monte Carlo is the simplest and least 
computationally expensive QMC method. In the VMC 
method, the expectation value of the Hamiltonian H with 
respect to a trial wave function is calculated using a 
stochastic integration technique, giving a variational es- 
timate for the ground state energy. 
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where £'l(R) = ^'rp^(R)iJ*T(R) is the local energy and 
R is a vector describing all the particle positions. The 
set {i?i}i=i,....n contains n energies and is produced by 
evaluating Ei = £'L(Ri) at n points {Ri;}j;=i....,n in con- 
figuration space distributed according to I^PtIR)!^- 

Due to the finite number of samples n, the VMC es- 
timate of the energy of Eq. ^ has a statistical error 
Ao = cro('^/'^corr)^^^^, whers (Tq is the standard devia- 
tion of the local-energy distribution and ricorr is the cor- 
relation length of the sequence of local energies. 

The quantity co only depends on the system and the 
trial wave function, whereas ricorr also depends on the 
sampling algorithm. Thus, for a given system, trial wave 
function, and sampling algorithm, the statistical error 
diminishes with the number of configurations sampled as 
n~i/2. Suppose one VMC step takes a time Titer- A VMC 
calculation is more efficient the less time it requires to 
reach a given statistical error Aq, so if a VMC run takes 
a CPU time of T = riTitcr to sample n configurations, an 
appropriate measure of its efficiency is 
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which is independent of n. The efficiency of a VMC calcu- 
lation can be improved by reducing the product ricorrTiter- 



B. VMC sampling 

The electronic configurations {Ri}i=i,...,n are gener- 
ated using the Metropolis algorithm Q, where a move 
from Ri to R^ is proposed with probability T(R^ ^ R^), 
and is accepted {i.e., Ri+i = R^) with probability 
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or otherwise rejected (i.e., Ri+i — Ri)- In fact, if the 
wave function can be factorized, one can greatly improve 
efficiency using multi- level sampling All of our cal- 
culations use two-level sampling, in which we accept or 
reject the move first based on the Slater determinant part 
of v]>rp(R) and then (if the Slater part of the move was 
accepted) based on the Jastrow factor [l[ . 



A simple, commonly used choice for T(R^ ^ R^) is 
the product of Gaussian distributions of variance r (stan- 
dard deviation -^t) for each of the Cartesian components 
of the displacement of each electron. By analogy with 
DMC, T is often referred to as the VMC "time step," 
although there is no notion of time in the VMC formal- 
ism. We shall restrict our analysis to the case of Gaus- 
sian transition probabilities. Alternatives to this choice 
have been proposed [§, [13 , but these studies focus on the 
statistical improvement for a given number of iterations, 
and do not analyze the total efficiency. The simplicity 
of the Gaussian distribution represents an efficiency ad- 
vantage that is hard to offset with more exotic distribu- 
tions. Nonetheless, the conclusions presented here should 
mostly be applicable to other transition probabilities. 



1. Configuration-by-configuration and electron-by-electron 
sampling 

In the sampling algorithm we have just described, to 
go from Ri to R^+i we propose an entire configuration 
move, and we accept it or reject it with a single deci- 
sion. This is what we call configuration-by-configuration 
sampling (CBCS). 

However, it is possible to generate R^+i from R^ by 
proposing N successive single-electron moves and accept- 
ing or rejecting each of them individually. The result- 
ing algorithm is electron- by-electron sampling (EBES), 
which allows larger moves to be accepted, greatly re- 
ducing ricorr- This comes at the cost of an increase in 
Titer, because evaluating the N acceptance probabilities 
in EBES takes longer than computing the single accep- 
tance probability in CBCS. 



2. Averaging local energies over proposed moves 

It is possible to replace the average in Eq. ([1]) with 
an expression where the local energies at R'^ and R^ are 
multiplied by the acceptance and rejection probabilities, 
respectively, and summed together. For CBCS, the ex- 
pression is [ll| : 
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This expression is also a valid approximation to the VMC 
energy, with the advantage that rejected moves con- 
tribute to the sum, adding new data and improving the 
statistics, especially when the acceptance ratio is low. 
This translates into a reduction in ncorr- The evalua- 
tion of the additional local energies increases Titer, how- 
ever. We investigate the balance of these factors below 
for CBCS. 

We have avoided averaging the energy over proposed 
moves in EBES since even with refinements it has been 
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found to be less efficient than the unmodified algo- 
rithm [il. 



3. Decorrelation loops 

It is possible to go from R; to Ri+i by proposing p 
configuration moves in turn instead of just one. In this 
scheme one generates a sample of n local energies by per- 
forming a calculation consisting of pn moves and evalu- 
ating the local energy at every pih configuration. 

The cost of one step of a VMC calculation with a decor- 
relation loop of length p is 
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where Tmovc is the time it takes to propose and accept or 
reject a single configuration move and Tonorgy is the time 
it takes to evaluate the local energy [27| . 

It is possible to establish the precise form of the corre- 
lation length Ucorrip) as a function of p. When n — >■ oo, 



,.(1) = 1 + 2VA 



fe=l 



(6) 



where At is the autocorrelation of local energies sepa- 
rated by k steps, 

A, = ^{iE,-{E)m+,-{E))) (7) 

If we assume that the autocorrelation is dominated by a 
single exponential term, i.e., Ak = exp(— afc) then Eq. 
^ becomes 

neon- = 1 + 2 ^ exp(-Q!fc) = 1 + 2 ^ ^ ~, , . (8) 
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Hence cxp(— a) = (jicorr — l)/('^corr + 1)7 and the corre- 
lation length at p is 



C. Automatic optimization of r 

Although the VMC algorithm is valid for any positive 
time step, the efficiency of the method depends strongly 
on T. An appropriate time step for EBES VMC can 
be very roughly estimated as being such that the root- 
mcan-square (RMS) distance moved by each electron at 
each time step is equal to the most important physical 
length scale in the problem. Assuming the acceptance 
probability of electron moves is approximately 50%, the 
RMS distance diffused is ■\/3t/2 in three dimensions. In 
an electron gas the only physical length scale is the radius 
Tg of the sphere that contains one electron on average, so 
the required time step is r sa 2r'^/3. In an atom the 
length scale is somewhere between the Bohr radius 1/Z, 
where Z is the atomic number, and 1 a.u. However, it is 
clear that these crude choices are far from optimal. 

There are two commonly-used approximate methods 
for choosing r; aiming to achieve an acceptance ratio of 
50% (the 50% rule), and maximizing the diffusion con- 
stant. Both can be implemented so that this optimization 
occurs automatically and inexpensively at the beginning 
of a VMC run. 

In the "50% rule" it is assumed that the ratio a of 
accepted moves to proposed moves is representative of 
the sampling efficiency, and that a value of 50% is near- 
optimal. In general, the two limits of 0% and 100% accep- 
tance correspond to a failure to properly explore phase 
space, but there is no particular reason why a = 50% 
should correspond to optimal sampling. 

The diffusion constant D can be computed as the av- 
erage of the squared displacement between consecutive 
configurations and Ri+i [l^. One might reasonably 
assume that choosing r to maximize D is an efficient 
strategy, although maximization of D does not neces- 
sarily correspond to optimal sampling. For example, in 
a CBCS study of the homogeneous electron gas, rigidly 
translating all of the electrons together results in a very 
large diffusion constant but clearly corresponds to poor 
exploration of phase space. 
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which falls off as p^^ if rtcorr is large. From Eqs. ([2|), 
([5]); and ^ we can build the full expression for £, and 
it is possible to find the value of p that maximizes £ 
analytically from estimates of Tmovc, ^energy, and ricoiT- 

The usefulness of decorrelation loops depends on how 
costly it is to evaluate local energies and how much serial 
correlation is present. Were it the case that local energies 
took no time to evaluate {i.e., Tonorgy = 0), the inclusion 
of decorrelation loops would not increase the efficiency £, 
and if no serial correlation were present then ricorr (p) = 1 : 
and increasing p would simply increase the cost of each 
step. 



D. Empirical data and analysis 

We shall consider four basic choices to be made 
when performing a VMC calculation with a Gaussian 
transition-probability density: whether to use CBCS or 
EBES, whether to average local energies over proposed 
moves, the value of the "time step" t, and the length of 
the decorrelation loop p. 

In order to study the effect of these choices, we have 
performed VMC calculations for a set of six representa- 
tive systems: a pseudopotential N atom, an all-electron 
O atom, a pseudopotential NiO molecule, an all-electron 
N2H4 molecule, a three-dimensional homogeneous elec- 
tron gas (HEG) composed of 38 electrons at a den- 
sity parameter of Tg = 1 a.u., and a 16-atom super- 
cell of a pseudopotential C diamond crystal (2^. For 
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each system, we tested two trial wave functions: one of 
Slater- Jastrow form [l|.[l3j and another of Slater- Jastrow- 
backflow form [14, UM ■ 



System 


''"opt 


Popt 


fflopt 


£'50%/£'opt 


^-Dmax/^-opt 


^■p— 1 / ^opt 


N (pp) 


0.20 


3 


55% 


1.00 


0.52 


0.65 


O 


0.05 


3 


58% 


0.94 


0.40 


0.68 


NiO (pp) 


0.20 


5 


44% 


0.94 


0.39 


0.41 


N2H4 


0.05 


3 


64% 


0.62 


0.09 


0.71 


HEG 


1.00 


3 


37% 


0.93 


0.96 


0.73 


Diamond 


1.00 


3 


32% 


0.94 


0.66 


0.60 



TABLE I: Optimal parameters and comparison of diflerent 
efficiencies for EBES using Slater- Jastrow wave functions. 
Pseudopotentials (pp) were used in some of the calculations. 



System 


Topt 


Popt 


Clopt 






£-p— 1 / ^opt 


N (pp) 


0.10 


8 


32% 


0.87 


0.90 


0.33 





0.01 


8 


27% 


0.82 


0.82 


0.64 


NiO (pp) 


0.02 


36 


17% 


0.70 


1.00 


0.13 


N2H4 


0.01 


13 


16% 


0.59 


1.00 


0.42 


HEG 


0.05 


36 


9% 


0.54 


0.88 


0.47 


Diamond 


0.02 


36 


11% 


0.25 


0.79 


0.13 



TABLE IL Optimal parameters and comparison of different 
efficiencies for CBCS using Slater-Jastrow wave functions. 



For each system and wave function, we have performed 
calculations using EBES and CBCS, and for CBCS we 
have run calculations with and without averaging over 
proposed moves. Finally, for each system, wave function, 
and sampling method, we have performed 160 VMC cal- 
culations covering 16 different values of r and 10 different 
values of p. In each case we have identified the maxi- 
mum efficiency Eopt — '''opt, Popt)- To assess the per- 
formance of the "50% rule," we have located the value 
of the time step T5g% whose acceptance ratio is closest 
to 50% and compared the efficiency £50% = i^^(T5o%,Popt) 
with £opt- To assess the performance of maximizing the 
diffusion constant, we have located the value of the time 
step T]j^_^^ with the maximum D and compared the ef- 
ficiency ^D^a^ = £{TD^^^,Popt) with £opt- To assess the 
importance of decorrelation loops, we have compared the 
efficiency £p=i = f (Topt, 1) with Sopt- The results of these 
comparisons are given in Table U for EBES and Table [TT] 
for CBCS, in both cases for the Slater-Jastrow wave func- 
tion only; the data for the Slater- Jastrow-backflow wave 
function are nearly identical and are not shown. 

For the periodic systems the acceptance ratio in EBES 
does not reach zero as r is increased, and as a conse- 
quence the efficiency presents a plateau in that region, 
where we find that £ is close to £opt- In EBES we also 
find that the "50% rule" consistently gives efficiencies 



within 10% of the maximum, with the exception of the 
N2H4 molecule, where the optimal acceptance ratio is 
larger. Maximization of the diffusion constant in EBES 
consistently gives time steps that are too large and yields 
efRciencies below about 50% of the maximum possible 
for finite systems, and between 65% and 95% of the 
maximum for periodic systems. In CBCS, maximizing 
the diffusion constant achieves reasonable efficiencies, of- 
ten within 10% of the maximum value, while the "50% 
rule" gives increasingly poor results as the system size 
increases. Decorrelation loops improve the efficiency in 
EBES by between 50% and 150%. In CBCS these be- 
come more important and enhance £ by up to a factor of 
seven. 



System 



TV 



iSEBEs/i^CBCS £^CBCS/^^CBCS2 

SJ SJB SJ SJB 



N (pp) 
O 

NiO (pp) 
N2H4 
HEG 
Diamond 



16 
18 
38 
64 



1.05 
1.47 
1.65 
1.93 
3.11 
4.70 



0.90 
1.07 
1.22 
0.83 
1.95 
2.36 



1.22 
1.10 
1.38 
1.11 
1.27 
1.14 



1.24 
1.17 
1.52 
1.53 
1.25 
1.24 



TABLE 111: Comparison of the efficiency of EBES and CBCS 
for Slater-Jastrow (SJ) and Slater- Jastrow-backflow (SJB) 
wave functions, and also for averaging local energies over 
proposed moves (CBCS2) and computing a single energies 
(CBCS). 



In Table IIIII we compare the maximum efficiency en- 
countered in EBES ^ebes with that in CBCS £cbcs for 
Slater-Jastrow (SJ) and Slater- Jastrow-backflow (SJB) 
wave functions. The fifth and sixth columns of Table Hill 
show the comparison for CBCS when a single energy is 
evaluated per configuration move (f cbcs ) ■ a-nd where av- 
erages of local energies over proposed moves are carried 

out (fcBCS2)- 

EBES is more efficient in all cases, with the excep- 
tion of the backflow calculations on the pseudopotential 
N atom and the all-electron N2II4 molecule. The im- 
provement in efficiency that EBES offers over CBCS in- 
creases with system size. Averaging energies over pro- 
posed moves is found to be less efficient in every case. 



E. Recommendations 

Our key finding is that decorrelation loops increase the 
efficiency of EBES by roughly a factor of two and that of 
CBCS by much more. One can use the expressions in Sec. 
Ill B 31 to determine the optimal loop length p, although 
in practice a decorrelation period of p = 3 delivers near- 
optimal efficiency in the EBES algorithm for a wide range 
of systems. 

Based on the data presented in Sec. IIIDi we suggest 
that EBES should nearly always be used in VMC, the 
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only possible exception being for small systems with 
fewer than about 20 electrons when backflow is used. 
(Even in this case, CBCS is not much more efficient than 
EBES.) When using EBES, one should use the "50% 
rule" to optimize the time step t. If CBCS is used, one 
should maximize the diffusion constant to optimize the 
time step r. Finally, we find that accumulation methods 
which average local energies over proposed moves are less 
efficient for every system tested. 



Assuming the data are Gaussian-distributed, the square 
of the standard error in the extrapolate Eq is 
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III. OPTIMIZING DMC TIME-STEP 
EXTRAPOLATION 

DMC is a Green's function projector method for solv- 
ing the Schrodingcr equation in imaginary time. In DMC, 
the ground state distribution is represented by the den- 
sity of walkers (points in configuration space) rather than 
by an analytic function. Propagation of a population of 
walkers in imaginary time projects out the ground-state 
component of the initial DMC wave function [l|, [l^ . 

The DMC algorithm is only accurate in the limit of 
small time step r. However, the computational effort 
required to achieve a given error bar goes as l/r, ruling 
out the use of infinitesimal time steps in practice. Hence, 
where high accuracy is required, two or more finite time 
steps {ti} are generally used and the ground-state en- 
ergy is obtained by extrapolating to r = [H [1] . Here 
we explain how the statistical error in a zero-time-step 
extrapolate may be minimized by a judicious choice of 
time steps {r^}, and the prudent deployment of a limited 
total computing time between those time steps. Note: 
since our paper was published, it has been drawn to our 
attention that the principal result obtained in this section 
was previously derived in Ref. \vK 

For sufficiently small r, the DMC energy scales lin- 
early with the time step as E{t) = Eq + kt. Suppose 
we calculate E{t) at R different time steps {t^} in the 
linear-bias regime, where each E(Ti) has an associated 
statistical uncertainty A^. The error bars fall off with 
the time step Ti and the CPU time devoted to the cal- 
culation Ti as A, = C / \/TiTi, where C is a constant. To 
determine the ground-state energy at zero time step i?o, 
we minimize the error of the linear fit. 



2 ^ A [E{t,) -Eq- nnf 
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with respect to k and Eq. Setting 
dy^ /dn = dy^ /dE^ — 0, we obtain 

^ 2EtiEf^ig(r.)r.r,-r.r|(r,~rO 
= ]^ • (,11/ 

E, = l TtTjTiTj{Tj - Tif 



As expected the standard error falls off as the time steps 
{Ti} are increased and as more time {Ti} is dedicated 
to the calculations. However, t should not be increased 
beyond Tmax, the limit of the region in which the bias 
is linear. The effort allocated to the calculations cannot 
be increased indefinitely because one is constrained by 
the total time T = EiLi ^r all of the simulations. 
We now minimize A§ subject to the constraint that T is 
fixed. 

Let us first suppose that we are to perform just R = 2 
simulations. We start by fixing the time steps ti and T2, 
and minimizing Aq with respect to the run lengths in the 
presence of a Lagrange multiplier to constrain the total 
run time T. This yields the optimal simulation durations 

Ti = rr|/V(^f ^ + ^2^^) and T2 = Trf^^/ir^^^ + r|/^). 
This deployment attempts to reduce the error bar on 
the calculation with the smallest time step beyond the 
distribution of effort Ti/T = T2/{ti + T2) that would 
ensure error bars of equal size. Without loss of generality, 
we now assume that T2 > ti , with T2 = Tmax 

pinned near 

the boundary of the linear regime, and we search for the 
optimal time step ti . Using the optimal durations Ti and 
T2, minimization of Ag reveals that the optimal choice 
of time step is ti = r2/4. The corresponding optimal 
physical run times are therefore Ti = 8T/9 and T2 = 
T/9. The full dependence of the final error upon the 
relative time step ti /t2 is shown in Fig. [T] 

Now suppose that more than two time steps are used to 
perform the extrapolation. We find that Aq is minimized 
when all the computational effort is dedicated to the two 
points that arc nearest to having a relative time step of 
4 and have the largest maximum value of t. Computa- 
tional effort should therefore be focused solely on that 
optimal pair as long as the linear regime is well-defined. 
There is thus no advantage to using more than R — 2 
data points. 

Our scheme is the optimal extrapolation procedure 
when the extent of the linear regime is known. The strat- 
egy is thus highly applicable to studies of many similar 
systems where the linear regime can be assumed to be the 
same for multiple runs. For systems where the behavior 
of the time step bias has not been established, one has 
no alternative but to perform multiple runs over a wide 
domain of time steps and determine where the spectrum 
first increases superlinearly. In such cases, one can use 
the RMS distance diffused by an electron over a single 
step as an initial order-of-magnitude estimate for where 
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0.1 0.2 0.3 0.4 0.5 

FIG. 1: (Color online) The uncertainty in the extrapolated 
DMC energy against relative step size, ti/t2- The solid 
line and circles show the uncertainty in the extrapolated re- 
sults obtained with the optimal relative run times [T1/T2 = 
{t2/ti)^^^], and the dashed line and the triangles with the 
effort distributed such that the energies have equally sized er- 
ror bars (T1/T2 = T2/T1). The symbols are DMC data from 
the one-dimensional HEG. The error bars are normalized by 
Armax, the error bar of a DMC run at the upper time step T2 
if all of the computational resources (Ti +T2) were dedicated 
to it. 

the linear regime begins. For all-electron atomic systems, 
for example, one would expect the linear regime to occur 
for time steps less than of the order r = 1/(3Z^), where 
Z is the largest atomic number occurring in the system. 
This choice of time step ensures that the RMS distance 
diffused is equal to one Bohr radius of the largest atom 
under study. For a homogeneous electron gas, where the 
only physically-significant length scale is defined by the 
density, the equivalent time step would be t = {r'^)/d, 
where is the radius of the sphere (circle in 2D) that 
contains one electron on average, and d is the dimension- 
ality. Time step bias is reduced when the modifications 
of Ref. m are made to the DMC Green's function, and 
also when higher-quality wave functions arc used. 

If one has accumulated a significant set of results for 
T < '''max hi determining the extent of the linear regime, 
the prescription for minimizing the error in the extrap- 
olate has the potential to differ from the two-run pro- 
cedure. If one has a large amount of computing time 
remaining after determining Tmax, the two-run approach 
is unchanged. In the event that little computing time 
remains after determining Tmax, one should devote the 
remaining time to the run whose contribution falls the 
quickest with computer time, i.e., the run i with the 
most negative value of dAo/dTi, which may be found 
from Eq. P^ . 

Avoiding higher order fitting functions and using only 
data from within the linear regime for the extrapolation 
is the most robust strategy. Though the formalism here 
can be extended to study higher-order fitting functions, 
finding the appropriate regimes for higher-order terms 
would require a larger amount of computational effort 



and there is a danger of numerical stability and branching 
problems affecting calculations for very large t. Linear 
extrapolation is always an option since the leading-order 
term in the bias is known to be 0(t). 

We highlight the benefits of the two-run extrapola- 
tion procedure with an example calculation on the ID 
HEG. Once the maximum allowed time step Tmax in the 
linear regime had been determined, pairs of runs were 
performed at T2 — Tmax and incrementally smaller time 
steps Ti . The pairs of runs were each performed using the 
same total amount of computing time. The time was dis- 
tributed either to ensure equal-sized error bars or accord- 
ing to the prescription T1/T2 = (t2/ti)^/^ to guarantee 
minimal final extrapolated error. The simulation times 
were sufficient to ensure that the data could be reblocked 
for accurate error estimates. The final extrapolated en- 
ergy estimates all agreed to within the expected uncer- 
tainty, consistent with the assertion that all of the time 
steps are within the linear regime. The results shown in 
Fig. [H highlight that, for the range of T2/T1 tested, there is 
strong agreement between the analytical prediction and 
the DMG results. In particular, the error bar on the ex- 
trapolate with the optimal distribution of effort is clearly 
minimized by the choice T2/T1 = 4. The distribution of 
effort according to T1/T2 = (t2/ti)'^/^ yields a modest 
computational advantage over the choice T1/T2 = T2/T1. 

In summary, to minimize the statistical error bar on 
the DMC energy extrapolated to zero time step, one 
should perform one DMC calculation at the largest time 
step Tmax for which the bias is still linear in the time step 
and a second DMC calculation with time step Tmax/4. 
Eight times as much computational effort should be de- 
voted to the latter calculation as to the former. One 
could use a similar approach to optimize the efficiency of 
extrapolating to infinite population or to infinite system 
size in a QMC study of condensed-matter systems. 



IV. REBLOCKING 

The use of small time steps in DMC results in serially- 
correlated data. For accurate estimates of the statisti- 
cal uncertainties of DMC expectation values, the serial 
correlation must be accounted for. Here, we investigate 
reblocking [isj , which is advantageous due to its compu- 
tational convenience and ease of implementation. We 
propose a scheme for the choice of block length such 
that accurate error bars may be reliably determined when 
an estimate for the correlation length is unavailable and 
must be obtained directly from the data. 

For most random processes used in Monte Carlo meth- 
ods the serial correlation is purely positive, so that 
the standard error (treating all samples as independent) 
should be multiplied by an error factor T^err > 1- Let the 
new estimate of the standard error be A, and let u he n 
divided by the estimated correlation length, i.e., v < n 
measures the estimated effective number of steps. We 
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may express A as 

A = ?7crrVvar[S,]/n ^Ya.i[Ei\/v, (13) 

where var[i?.i] is the sample variance of the n data points 
{Ei} and the error factor r/err is the square root of the 
estimated correlation length [y| . As each step of a QMC 
calculation is associated with a time step r measured in 
physical units, a correlation time in physical units can 
be defined as icon- = T?T-corr (■'')• In the limit r — > 0, the 
integrated correlation time icorr becomes independent of 
T and takes a value characteristic of the system under 
study. 

To estimate jycrr from a set of data points Ei^ there 
are several commonly-uscd approaches: computing the 
correlation length, rcblocking, or using resampling tech- 
niques like the jackknife and bootstrap methods [y, [isl - 
[20j . Here we have focused on the reblocking method be- 
cause it is computationally convenient (and conceptually 
very simple) to apply reblocking continuously as local ob- 
servable data are appended to the stored results, vastly 
reducing memory requirements [2]| . A naive calculation 
of the correlation-corrected statistical error necessitates 
the storage of 0{n) observable values, whereas reblocking 
on-the-fly reduces this to 0[log(n)]. 

Reblocking is a method in which a sequence of n se- 
rially correlated data points is divided into contiguous 
blocks of length B, and the raw data are averaged within 
each of these blocks, defining a new data set of length 
n/B. The naive variance of the rcblocked estimate of the 
mean is larger than that of the original data, although the 
mean itself is unchanged. The estimated error initially 
increases with B, reaching a plateau once the serial cor- 
relation has approximately been removed from the data. 
When B approaches n, the plot becomes very noisy due 
to the small number of blocks. 

The reblocking analysis of a typical DMC run is shown 
in Fig. [5J The fundamental difficulty in interpreting this 
kind of data is the choice of an appropriate block size. 
In the case presented here, the run time of 900000 time 
steps was sufficiently long to form a clear plateau in the 
reblock plot. However, individually inspecting the re- 
blocked data of each calculation to make a choice by eye 
is neither objective nor efficient. Tabic HVl shows the es- 
timated correlation lengths from rcblocking the Li data 
with different block lengths. 

A simple yet robust algorithm for automatically choos- 
ing the best block size is as follows. Following Ref. H, the 
block size 

Bopt = ^2nn2^„, (14) 

offers an appropriate balance between the systematic and 
the statistical error in the estimate of the standard error 
for any set of n data points with the integrated correla- 
tion length TT-corr- If a good estimate for ricorr is available 
before the data are analyzed, it is best to use this and 
thereby make the choice of the block size independent of 
the statistical data themselves. In many studies, several 
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FIG. 2: (Color online) Reblocking analysis of a typical DMC 
run (Li atom, with r = 0.01 a.u. and 900000 time steps). The 
optimal block size is chosen by the algorithm described in the 
text. 
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1 


1.762(4) 


2 


3.140(9) 


3 


5.51(2) 


4 


9.34(5) 


5 


14.7(1) 


6 


21.0(2) 


7 


28.9(5) 
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34.0(8) 


9 


36(1) 


10 


39(2) 


11 


40(3) 


12 


37(3) 


13 


42(6) 


14 


44(8) 



TABLE IV: The estimated correlation length found from 
reblocking DMC data with block size B. The system was the 
Li atom with t = 0.01 a.u. and 900000 time steps. 



runs on similar systems are needed or the knowledge of 
tcorr can be used to extrapolate ncorr to small time steps. 
In such cases it is best to estimate ncorr once and reuse 
it for the choice of -Bopt in subsequent calculations, pro- 
vided that the physical system and wave-function quality 
(and thus the correlation length) are unchanged. The er- 
ror factor ryorr obtained in each case can then be used to 
double-check the transferability of the estimated correla- 
tion length without influencing the choice of i?opt , so that 
there is no bias from manually making a data-dependent 
choice. 

If an independent estimate for ricorr is not available, it 
has to be obtained from the analyzed data themselves. In 
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this case, the estimated correlation length rj^^^ depends 
on the choice of B, so the condition for Bopt becomes 
recursive. We consider block sizes that arc powers of 2 
and start with the largest block size possible, decreasing 
B and examining the error factor. The optimal block size 
is then the last value of B for which the inequality B^ > 
2nr]'^j.j.{B) is satisfied. We may restrict B to powers of 2 
since the block length is expected to be logarithmically 
distributed. 

In a reblocking analysis for n data points, the relative 
error in the error factor for a given block size depends 
only on the number of blocks as 

SjerriB)^ /Z ( ^ K) 

VcrriB) V2n- ^ 

Assuming that a user would typically expect at least one 
significant digit in the standard error, we can further de- 
fine a straightforward criterion for the success of a re- 
blocking analysis: if B^pt < n/50, the analysis can be ac- 
cepted as successful, otherwise the reliability of the result 
is questionable and one should gather more data. Except 
for systems with distinct correlation times at extremely 
different scales, this criterion is expected to be reliable 
in all typical cases occurring in QMC. More than one 
correlation time might occur in weakly bound molecules; 
the longest correlation time is defined by the size of the 
molecule and the shortest is determined by the Bohr ra- 
dius of the nucleus with the highest atomic number. In 
such cases, it may be necessary to accumulate more data; 
the block size should clearly be determined by the longest 
correlation length. 

In summary, when using reblocking to remove serial 
correlation from QMC data, one should ideally obtain 
an accurate estimate of the correlation length separate 
from the data being analyzed and use Bopt = v^^'lorr 
to determine the block length . If this is not possible, 
one should aim to satisfy the inequalities B^ > 2nri^^^.{B) 
and Bopt < n/50 for a reliable and accurate estimate of 
the error. 

All methods of accurately calculating the error bar 
from serially-correlated data implicitly estimate the cor- 
relation length. The noise and associated uncertainty in 
estimates of the correlation length introduce error into 
the estimated statistical error bar. In the next section 
we describe how this can increase the apparent number 
of outlying results. 



V. OUTLIERS IN QMC RESULTS 

A. Introduction 

In this section, we investigate the frequency with which 
"outliers" occur in QMC results. We define an outlier as 
a result located more than a given number of estimated 
error bars from the underlying mean value. For example, 
one may fit a straight line to DMC energies at small r. 



If there are sufficient data points, the linear fit is a good 
estimate of the underlying mean; one would usually ex- 
pect, by the central limit theorem (CLT), a fraction 0.32 
of the points to deviate from the fitted function by more 
than a single error bar. Here we address the observation 
that QMC estimates can lie outside statistical error bars 
of the underlying mean more often than one would expect 
were the error bars correctly describing the width of an 
underlying Gaussian distribution. We will demonstrate 
that uncertainty in the estimated correlation length is 
largely responsible for this effect. 

We begin with direct observation of the numbers of 
outliers for two systems, the C atom and the Si crystal. 
By performing a large number of short VMC calculations 
for each system, we count directly the number of energies 
occurring more than Q error bars from the underlying 
mean, where the error is estimated separately for each 
run. Each estimate of the statistical error is also implic- 
itly an estimate of the correlation length, as described by 
Eq. (USD. 

To complement the direct approach, we then derive an 
analytic expression for the fraction of points expected to 
lie more than Q error bars from the mean under the as- 
sumption that the distribution of local energies is Gaus- 
sian. The resulting expression depends on the distribu- 
tion of estimated correlation lengths. Finally, we com- 
pare the expected result from this purely Gaussian model 
process with that found earlier from VMC, forming con- 
clusions about the validity of the Gaussian assumption 
and the origin of outliers. 



B. VMC calculations 

We have performed a large number of VMC calcu- 
lations for two typical systems; the all-electron carbon 
atom and a periodic crystalline silicon system. For the C 
atom we performed 5 x 10'*, 2 x 10^, and 10^ calculations 
of length 200, 500, and 1000 steps, respectively. The Si 
system used a periodic simulation cell containing 54 sil- 
icon atoms, where the ls^2s^2p^ electrons are described 
by pseudopotentials. For the Si system, we performed 
1.5 X 10^ 7.5 X 10^, and 3 x 10"* calculations of length 
100, 200, and 500 steps, respectively. 

A short calculation yields an energy and estimated 
error. From the data we estimate the probability 
P [SE > QA) of observing a VMC energy _E at a po- 
sition more than QA from the true mean Eq, where 
SE = \E—Eo\ and A is the estimated error bar, itself also 
a random variable. The underlying mean Eq is calculated 
accurately using a much longer run. If the error bars ex- 
actly described the width of an underlying Gaussian dis- 
tribution, one would expect P {SE > QA) = erfc(Q/V2). 
The symbols in Figs. H] and [5] show the deviation of the 
VMC results from this ideal case. 

By estimating the statistical error bar for each run, we 
are able to estimate jt-md i which is the distribution of the 
estimated effective number of steps v = n/rj^^^, where n 
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FIG. 3: (Color online) Distribution oi v — n/rf^rr from per- 
forming 5 X 10'* all-electron VMC calculations for the C atom. 
Each calculation consisted of ?i = 200 steps and the error fac- 
tors were obtained by reblocking. The dashed lines show the 
accurate effective number of steps, vo, and the effective num- 
ber of steps corresponding to no serial correlation, v — n. 



is the number of VMC steps and 77orr is the error fac- 
tor of Eq. (jl3p . An example kernel estimate of pind is 
shown in Fig. [31 one can see that i' is occasionally larger 
than n. This is clearly unphysical, stemming from noise 
in the estimate of the correlation length, and results in 
underestimation of the statistical error bar. The distri- 
bution Pind appears to decay at large v as where A 
is between 4.5 and 6.5. 



C. Gaussian model 

We now attempt to replace VMC sampling with an 
ideal process where the underlying distributions are 
Gaussian. Our starting point is the distribution of lo- 
cal energies, pioo from which energies are drawn at suc- 
cessive points along the random walk in configuration 
space. The quantity of interest is again the probability 
P [SE > QA) of observing a sample mean energy i? at a 
position more than QA from the true mean Eq. 

Let us assume that the distribution of local energies is 
Gaussian, 



Pioc {El 



27rcro 



exp 



<EL~Eof 
2al 



The statistical error bar on E is calculated from the same 
set of local energies as the estimate itself. However, since 
estimates of the correlation length are subject to noise, 
there is uncertainty in the effective number of indepen- 
dent samples. Although this leaves E unaffected, it does 
influence the estimated error. As before, we define v as 
the random estimate of va and again refer to the PDF 
Pind from which v is drawn. 

It is well-known that a sum of squares of normally- 
distributed random numbers follows the chi-square dis- 
tribution [l^. Since the error bar A is related to the 
sample variance through Eq. (jl3p . we can write down 



the bivariate PDF por^ for A and v, 



Pcrr(A, 



■ exp 



2^. 



Pind(l^) 



(^) ' 2-^r(^) 



(18) 



where A is only allowed to take positive values and F is 
the Gamma function. It is straightforward to find analyt- 
ically the probability of observing an energy more than 
Q error bars from the mean as a function of Q and A. 
This is done by integrating Eq. ([TTl) . 

2 r dE pavc(^) = erfc (—\[^) ■ (19) 
Jeo+qa V o-q V ^ / 

To find the desired probability, P {SE > QA) , we evalu- 
ate the expectation value of Eq. with respect to the 
distribution of A and v, 



poo poo 

P{6E>QA) = Av j dApcrr(A,zy) 
J2 Jo 




(16) 



(20) 



where we have used the fact that the sample mean and 
sample variance are independent for Gaussian distributed 
random variables [2^, [2^ . To evaluate the integral of Eq. 
pop , we require the distribution pind and an accurate 
estimate of the true effective number of steps, i^o- We 
will take these quantities from the VMC results of Sec. 
IV Bl so that the integral of Eq. pO)) represents an ideal 
Gaussian process accompanied by the uncertainty in the 
number of independent samples (and thus the correlation 
length) that we observe in VMC. The integral of Eq. (^0)) 
can then be evaluated numerically. 



where <7q is the variance of the distribution. Consider 
drawing n samples from the PDF of Eq. 

using the Metropolis algorithm; this yields vq < n inde- 
pendent samples due to serial correlation. For this simple 
case the sample mean, E ~ 
tribution 



-lEi, has the dis- 



PavoC-E) 



exp 



-(^-^0) 

2cr2/i/„ 



(17) 



D. Results 

Figures |3] and [S] show the actual fractions of outliers 
from the VMC calculations compared with those pre- 
dicted by Eq. pO)). which used pind and vq from the 
VMC calculations but otherwise assumed a model Gaus- 
sian process. The fraction of points occurring more than 
Q error bars from the mean has been offset by erfc((5/\/2) 
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FIG. 4: (Color online) Enhancement to the probability of 
observing an energy more than Q error bars from the mean 
for 54-atom (216-electron) bulk Si. The square, circular and 
triangular symbols show the results of VMC calculations of 
n = 100, 200 and 500 local energies, respectively. The number 
of calculations for each set was (1.5 X 10'')/n. The lines show 

where and 



the results of evaluating the integral of Eq. pOl 
Pind were determined from the VMC data. 
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FIG. 5: (Color online) Enhancement to the probability of 
observing an energy more than Q error bars from the mean for 
the C atom. The circles, squares and triangles represent all- 
electron VMC calculations with n = 200, 500, and 1000 local 
energies, respectively. The number of calculations for each set 
was 10^ /n. The lines represent the results of evaluating the 
integral of Eq. (|20[l. where vq and Pind were determined from 
the VMC data. 



ing the sampling efficiency. In the case of the C atom, in- 
stead of the 0.13 probability of observing an energy more 
than 1.5 error bars from the mean that one would expect 
on the basis of Gaussian statistics, the VMC results are 
consistent vi^itli a 0.25 probability (for runs of 200 local 
energies). For the C and Si systems, estimating the er- 
ror bars for each short run using a single more accurate 
estimate of the correlation length (from a single longer 
run or by averaging the estimates from each shorter run) , 
results in a return to P (^SE > QA) = erfc(Q/-\/2). 

For systems exhibiting singularities in the local energy, 
the CLT converges only very slovifly and one might ex- 
pect the non-Gaussian character of pel to play a role 
in determining the frequency with which outliers are ob- 
served [25} . Singularities in the local energy arise when 
the description of the wave- function nodes is inexact, as 
is the case for the C and Si systems considered here, and 
when the cusp conditions are unfulfilled. 

We find that the contribution from the non-Gaussian 
parts of the energy PDF towards the frequency of out- 
liers is statistically insignificant. The evidence for this 
is twofold; first, the integrals based on a purely Gaus- 
sian pioc agree very well with the VMC data, suggesting 
that uncertainty in the correlation length is almost solely 
responsible for the effect. Secondly, attempting to fit a 
function with power law tails (of the form suggested in 
Rcf. [25|) to the VMC energies yields very small values 
for the weight under the tails (usually within error bars 
of zero) , even though the distribution of local energies is 
itself manifestly non-Gaussian (soj . 

In conclusion, when there is too little data to make an 
accurate estimate of the correlation length, the estimated 
error is subject to an uncertainty that increases the prob- 
ability of observing outliers. For isolated calculations of 
a single run, the problem amounts to the gathering of 
sufficient data for an accurate estimate of the correlation 
length. Where dependence upon several parameters is 
being investigated for large systems, one should calcu- 
late accurately the correlation length from a single long 
run or by averaging many estimates from shorter runs. 
The accurate estimate of the correlation length can then 
be interpreted as the square of the error factor, ryorr, and 
used to calculate the error bars on related calculations 
in two ways: either by guiding the choice of block length 
{B^ = 2nr/*^^) or by multiplying the unreblocked error by 
77orr; the two estimates should be roughly consistent. 



VI. CONCLUSIONS 



in the figures to highlight the deviation from the re- 
sult when the correlation length is known exactly, i.e., 
Pind(l^) = 5{iy - Vq). 

When n takes smaller values, the uncertainty in the 
correlation length is greater and the fraction of points 
which may be classified as outliers is larger. A poor trial 
wave function could also contribute to the effect by rcduc- 



In this paper we have developed and carefully tested 
new ways of improving the efficiency of QMC calcula- 
tions. 

Our analysis of VMC efficiency shows that the use of 
decorrelation loops approximately doubles the efficiency 
of EBES, with a loop of three moves providing the great- 
est benefit for a wide range of systems. The improvement 
in efficiency for CBCS is much greater. However, we find 
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that EBES rather than CBCS yields a higher efficiency, 
except in small systems where backflow transformations 
arc used. Of the automatic schemes for optimizing the 
time step that we have considered, attempting to achieve 
a move acceptance ratio of 50% leads to the greatest ef- 
ficiency within EBES. 

For the extrapolation of DMC energies to zero time 
step there is a clear optimal strategy. One must first 
find the largest time step T2 for which the energy can be 
considered to vary linearly with time step. One should 
then minimize the error in the extrapolate by performing 
calculations at two different time steps; the first at ti = 
T2/4 with computational effort 8T/9, and the second at 
T2 with computational effort T/Q, where T is the total 
computing time available. 

The reblocking method of removing serial correlation 
from QMC data offers a significant computational advan- 
tage over other methods. Ideally, when choosing a block 
size, one should estimate the correlation length for a sys- 
tem independently of the serially-correlated data them- 
selves. The optimal block length B should be chosen such 
that > 2nT]^j.j. and B < n/50 [where n is the number 
of data points and r^err is the error factor of Eq. p^ ]. 
This allows automated data processing with a warning 
criterion for insufficient data that works reliably in the 
absence of multiple correlation periods occurring on dis- 
tinctly different scales. 

Finally, we note that uncertainty in the correlation 
length leads to estimated error bars that have the po- 
tential to increase the probability of observing outliers in 
QMC results. The size of the effect is dependent on the 



system and wave function. One can alleviate the prob- 
lem by calculating the statistical error using an accurate 
estimate of the correlation length from a longer run. Oth- 
erwise, our findings highlight the importance of sufficient 
statistics-gathering and caution when interpreting DMC 
results for large systems. 

Quantum Monte Carlo techniques are not as widely 
used as other methods due to their computational ex- 
pense and the complexity of carrying out a calculation. 
In addition to improving the statistical and computa- 
tional efficiency of QMC calculations, the strategies we 
have described are straightforward to automate. With 
the implementation of such schemes, QMC has the po- 
tential to evolve into a true black-box tool. This will 
facilitate wider use of the method and improve its relia- 
bility. 
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those for which the energy is evaluated. 
[29] Our calculations were performed on a cluster of eight 
24GB, dual-socket, quad-core, 2.66GHz Intel Core 17 pro- 
cessors. However, we have only quoted ratios of efficien- 
cies in this paper, which should be largely architecture- 
independent. 

[30] We form a biased estimate for the weight of the power- 
law tails by fitting Eq. (48) of Ref. [2^ to the distribution 
of energies obtained from 10"* VMC runs, each of 1000 
steps. We find A3 = 1.1(8) and A3 = 0.2(4) for the C 
atom and the bulk Si system, respectively. The error 
in the fit was 0.95 per data point for the C atom and 1.03 
per data point for the bulk Si system 



